Molecular dynamics simulation of polymer helix formation using rigid-link methods 
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Molecular dynamics simulations are used to study structure formation in simple model polymer 
chains that are subject to excluded volume and torsional interactions. The changing conformations 
exhibited by chains of different lengths under gradual cooling are followed until each reaches a state 
from which no further change is possible. The interactions are chosen so that the true ground state 
is a helix, and a high proportion of simulation runs succeed in reaching this state; the fraction that 
manage to form defect-free helices is a function of both chain length and cooling rate. In order to 
demonstrate behavior analogous to the formation of protein tertiary structure, additional attractive 
interactions are introduced into the model, leading to the appearance of aligned, antiparallel helix 
pairs. The simulations employ a computational approach that deals directly with the internal 
coordinates in a recursive manner; this representation is able to maintain constant bond lengths 
and angles without the necessity of treating them as an algebraic constraint problem supplementary 
to the equations of motion. 
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PACS numbers: 87.15.Aa, 02.70.Ns, 45.40.Ln 



I. INTRODUCTION 



Polymers, because of their importance and complexity, 
have provided a longstanding challenge for computer sim- 
ulation. Over the years, the field has become fragmented, 
both in terms of the problems addressed and the method- 
ology employed. Broadly speaking, the kinds of system 
studied can be classified into distinct groups; there are 
biological heteropolymers, a category dominated by the 
proteins; homopolymers and block copolymers that in- 
clude a great variety of molecular types, from alkanes to 
plastics; and idealized polymer models used for elucidat- 
ing general principles such as the theta point, reptation, 
and multiphase behavior. The computational techniques 
span an equally broad range; they include molecular dy- 
namics (MD) simulation employing models that repre- 
sent the molecules at various levels of detail, ranging from 
fully atomic to highly reduced descriptions; Monte Carlo 
sampling of both continuum- and lattice-based systems, 
again with different levels of representation; and exact 
enumeration of small systems aimed at eliminating the 
sampling errors inherent in the other methods. While all 
three kinds of methodology provide important informa- 
tion about equilibrium behavior and, in a sense, amount 
to doing statistical mechanics numerically, it is only the 
MD approach that provides access to the dynamical and 
nonequilibrium aspects of the behavior; while it might 
be argued that Monte Carlo shares some of this capa- 
bility, the associated dynamics is artificial and entirely a 
consequence of the chosen stochastic sampling algorithm, 
and so bears little relationship to Newtonian dynamics. 
Lattice-based approaches, though offering a vastly re- 
duced configuration space, have the additional problem 
of the discreteness of the lattice on which the polymer is 
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embedded, and the consequent absence of gradual tran- 
sitions between different configurations. 

The inherent difficulty in polymer simulation is that 
the problem naturally embraces a broad range of 
timescales, ranging from very fast processes associ- 
ated with bond vibration, followed by the somewhat 
slower, highly localized conformational changes such as 
crankshaft motions, then the even slower aspects of reor- 
ganization such as the still relatively localized process of 
helix formation, and, finally, the typically extremely slow 
changes that lead to the emergence of tertiary structure 
characteristic of protein folding and to polymer diffusion 
in a concentrated solution. The timescales associated 
with this hierarchy of processes span a range consider- 
ably in excess of ten orders of magnitude, and so such 
systems are clearly not generally amenable to direct mod- 
eling, unless subjected to major simplification. Consid- 
erable effort has been invested in the design of models 
and simulation methods with the aim of alleviating this 
problem to at least some degree. 

One especially important application of polymer simu- 
lation is in the field of protein folding, e.g., § |, § j§ |; 
achieving an understanding of the mechanisms underly- 
ing this important process presents a major challenge to 
computational biochemistry. Protein modeling runs the 
gamut from, at one extreme, highly detailed molecular 
representations involving potentials derived from a mix- 
ture of theory and experiment, together with a solvent 
of individual water molecules, all solved by MD and an 
enormous amount of computational effort |6|, ML through 
highly simplified models also solved by MD |1, to yet 
even simpler models embedded in lattices with only a lim- 
ited number of degrees of freedom (DOFs) studied using 
a suitable Monte Carlo procedure and a greatly reduced 
investment in computing p| ; even complete enumeration 
of all conformations is sometimes feasible ||. While the 
manner in which the amino acid sequence of any given 
protein is able to determine its presumably unique spatial 
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FIG. 1: A well- formed helix in a chain of length 90; a goal of 
the simulations is to observe chains spontaneously collapsing 
into this state (the polymer is drawn as a tube whose radius 
is that of the monomers). 




FIG. 2: A randomly coiled chain of length 90; this configura- 
tion represents a typical state of the chain prior to the onset 
of folding. 



structure continues to be the subject of intense study, of 
no less importance is the question of the folding pathway 
- the preferred route (or routes) through multidimen- 
sional conformation space eventually terminating at the 
native state. While all the widely differing methodolo- 
gies enumerated above can be used for studying folded 
states, the collective dynamical processes that underlie 
folding really demand an approach based on MD. But, af- 
ter reaching this conclusion, there is a practical question 
of whether, even after substantial simplification, serious 
progress in understanding the mechanisms of folding can 
be achieved by computer simulation, owing to the di- 
versity of intrinsic timescales; while substantial advances 
have been made, a great deal remains to be done before 
this question is answered. 

The goal of the present paper is twofold. The first goal 
is a demonstration of a different perspective on the MD 
approach to studying protein folding. The most ambi- 
tious level of modeling is based on carefully constructed 
potential functions, often with a multitude of parame- 
ters; since the native conformation generally corresponds 
to the state of minimum free energy, establishing the de- 
tails of these interatomic interactions, including solvent 
effects, provides the foundation for such work. Determin- 
ing whether the known native state of a given protein is 
the one favored by energetic considerations is in itself a 



complex optimization task, but following the full dynam- 
ics over a sufficiently long period of time for the major 
structural changes that typify protein folding to occur 
verges on the impossible. The approach adopted here is 
just the opposite, and the question posed is the follow- 
ing: Given a known structural motif, such as the helix, 
and a simplified model of a polymer chain with a readily 
determined, unique ground state corresponding to this 
configuration, as in Fig. |l|, will the chain collapse into 
this state within a reasonable amount of computation 
time when allowed to move freely in space, as shown in 
Fig. ||, while subjected to gradual cooling? 

The most elementary of these organized structures is 
the helix, which, while being a prominent feature in many 
globular proteins, is only classified as a secondary struc- 
tural element (the primary structure being the amino 
acid sequence itself), and because of its homogeneous na- 
ture (except for the ends) it might be argued that being 
able to fold a helix is not really a significant step in learn- 
ing how to fold an entire protein. Therefore, another fold- 
ing problem considered here is one with a ground state 
formed from an antiparallel pair of helices. This, too, is 
a recognizable element in some proteins, and is unques- 
tionably classified as tertiary structure. 

The obvious extension of this approach, a subject for 
future exploration, is to design simple models for other 
structural motifs, in the hope of learning more about 
folding by examining the collapse pathways of these ide- 
alized models; some structures might fold more readily 
than others, in which case the steric and topological is- 
sues involved could be investigated; for some structures 
there might be recognizable intermediate states along the 
folding trajectory; some cases might reveal useful proper- 
ties that, when regarded as conformational (or reaction) 
"coordinates" , might serve in the design of other kinds of 
simplified models |9|] ; and finally, once the simple version 
has been found to have the correct behavior, the models 
could be enhanced by gradually incorporating features 
from more realistic representations, including specific in- 
teractions and structural details. This represents the mo- 
tivation for this kind of modeling approach. 

The second goal is methodological. Even when consid- 
ering the simplest of model polymers, in which, typically, 
all the molecular detail is absorbed into effective atoms 
located along the backbone chain (more so if this sim- 
plification is not made) there is a need to specify the in- 
ternal DOFs of the system. One possibility is to assume 
that adjacent atoms are connected by stiff springs repre- 
sented by a suitable potential function; in this case each 
atom has its full complement of three translational DOFs 
and, if these atoms are regarded as rigid particles rather 
than point masses, three rotational DOFs as well, ff the 
bond potentials are made sufficiently stiff to correspond 
to a typical real system, the ensuing high-frequency vi- 
brations impose a very small integration timestep, which 
runs contrary to the goal of efficiently simulating over 
long periods of time. 

It is, however, possible to introduce geometrical re- 
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strictions, such as strictly constant bond lengths, while 
retaining a soluble dynamical problem. This is done by 
introducing holonomic constraints and Lagrange multi- 
pliers into the equations of motion [jlo|| , and then solving 
a set of algebraic equations while integrating the differ- 
ential equations of motion. Two approaches have been 
developed for doing this; one involves initially solving the 
unconstrained equations of motion over a single timestep 
and then iteratively correcting the relative coordinates 
PI, |l2], and, optionally, also the relative velocities [ ji3| , 
using a relaxation procedure to ensure the constraints 
remain satisfied; the other tackles the problem by con- 
structing a matrix representing the contributions of the 
constraints which, in effect, must be inverted at each 
timestep Jl4|, and which is subject to gradual drift re- 
quiring regular correction. Similar geometric constraints 
can be introduced to maintain constant bond angles as 
well, since it is often a reasonable approximation to as- 
sume that the angles between consecutive bonds along 
the backbone (or elsewhere) are unvarying. Such geo- 
metrical constraints have proved extremely useful, given 
the nature of the excitations present in the system: fluc- 
tuations in bond lengths, and sometimes also angles, tend 
to be of relatively small amplitude and high frequency, 
so that freezing them out of the dynamics permits a sub- 
stantial increase in the allowed integration timestep. The 
amount of additional processing required for the con- 
straints depends on their number n c ; the dependence is 
typically 0(n c ) for the iterative approach, but for the 
matrix approach it is 0(n^.), making the latter unsuit- 
able for large problems. 

If bonds lengths and angles are fixed, the only remain- 
ing internal DOFs are the dihedral angles, each defined in 
terms of a rotation about an axis lying along a bond, and 
affecting the relative orientation of the pair of bonds on 
either side. For reasons shrouded in history, dealing with 
this problem has been perceived as difficult, as indeed 
it is, if the problem is not addressed in a suitable man- 
ner. A significant advance in the methodology for dealing 
with dynamical problems involving internal coordinates 
occurred some years ago in the robotics field |lf| 0, but 
with only the occasional exception, e.g., Q, it appears 
to have gone unappreciated by the polymer simulation 
community at large. Because of the importance of this 
technique, the goal of which is to deal directly and eco- 
nomically with the internal DOFs, and since there is no 
reason why it should not be capable of replacing the var- 
ious constraint-based approaches for most applications, 
a detailed treatment of the underlying theory is included 
in the paper. 

This approach to the dynamics of linked bodies also 
requires solving the dynamics of individual rigid bodies. 
An alternative, recently described means |fT{| of numeri- 
cally dealing with the rigid-body equations of motion is 
discussed briefly; the method is based on rotation ma- 
trices, rather than on quaternions (or even Euler angles) 
that are generally used. The present formulation dif- 
fers slightly from the original in regard to the reference 



frame in which the computations are carried out. The 
use of rotation matrices offers improved numerical stabil- 
ity, and since the method belongs to the leapfrog family 
of integrators, it means that simple leapfrog integration 
techniques can be used for the entire set of dynamical 
equations appearing in the problem. 

II. LINKED-BODY DYNAMICS 
A. Chain coordinates 

Consider a linear polymer chain whose monomers are 
joined by rigid bonds. In the discussion that follows, 
the terms "monomer", "atom", "site" and "joint" will 
be used interchangeably, as appropriate to the context, 
likewise "link" and "bond". Bond lengths and angles 
are constant. If each torsional DOF is regarded as a 
mechanical joint associated with the site at one end of the 
link, with just a single rotational DOF, then the system 
is analogous to a basic problem in the field of robotic 
manipulators Jll| [ijj . 

The chain configuration is defined by the site positions 
{r^}, and if the bond vectors between adjacent sites are 
{bfc} then r^.+i — rk + bk- The internal configuration 
of the chain can be specified by a set of bond rotation 
matrices {.Rfe}. The transformation between the local 
coordinate frames attached to bonds k — 1 and k (k > 1) 
involves a rotation through the bond angle ak about the 
axis 5Ejt_x, where cosafc = bk-i-bk, followed by arotation 
through the dihedral angle 9 k about the joint axis Zk-i- 
The matrix (actually its transpose) corresponding to this 
rotation is 

(cos 9k — sin 9k cos ak sin 9k sin a k \ 
sin9k cos cos afc - cos9 k smak , 
sinafc cosafc / 

(1) 

so that 

R l = ^ R o,i •'•- R fc-i,fc' ( 2 ) 

where Rq represents the orientation of the initial site and 
bond, and 

rk+i = Tfe + \bk\Rlz. (3) 

In the present case, {|bfc|} and {ak} are all constant, 
so that the only internal DOFs are those associated with 
{9k}- Define hk to be the rotation axis of the joint be- 
tween bonds k— 1 and k that is fixed in the frame of bond 
k — 1; in the present case hk = Zk-i- Insofar as indexing 
is concerned, there are n r internal rotational joints (with 
labels 1, . . . ,n r ), = n r + l bonds (0, . . . ,n r ) and n r + 2 
sites (0, . ..,n r + 1). In order to completely specify the 
chain configuration, an additional joint is attached to the 
k = site, with three translational and three rotational 
DOFs (conceptually equivalent to a telescopic ball-and- 
socket joint); this joint is included in the formalism but 
will, eventually, be treated separately. 
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B. Kinematic and dynamic relations 

If v k and u3k are the linear and angular velocities of 
site k, then the velocities and accelerations of adjacent 
sites are related by 



<jj k 


= Wfe_l 


+ h k 9 k 


(4) 


Vk 


= Vk-l - 


■f X b k -l 


(5) 


LO k 


= Uk-l 


+ h k e k + w fc _i x h k e k 


(6) 


Vk 


= Vk-l - 


■f ti)fc_i x 








i x x bfe_i) 


(7) 



where 1 < k < n r . While the mass elements of the chain 
are normally identified with the sites, here it is helpful to 
associate them with the bonds; if r k + c k is the location 
of the center-of-mass of the atoms attached to bond k, 
then the center-of-mass acceleration of the bond is 



V k + UJ k x c k + iv k x (u> k x c k ) 



(8) 



If fk and n k are the force and torque acting on bond k 
across joint k, then the equations of motion are 



X k 0J k + uj k x (T k u) k ) 



m k v k = 



n k - n k+ i - c k x f k 
-(b k -c k ) x f k+ i +n e k (9) 
fk - fk+i + fk, (io) 



where f k and n k are the externally applied force and 
torque; m k and I k are the mass and moment of inertia 
of (the atoms associated with) the bond, the latter ex- 
pressed in a space-fixed frame and relative to the center 
of mass of the bond. It is often convenient when deal- 
ing with rigid bodies to work in a center-of-mass frame 
Jl0{ ; this is not the case here, and all vector components 
are expressed in the space-fixed coordinate frame. Rear- 
range the terms of Eqs. (g) and ( [To| ) to obtain relations 
between torques and forces on adjacent bonds, 

n k = n k+ i + b k x fk+i + m k c k x v c k 

+X k u; k + iv k x {X k uj k ) - n e k - c fe x fl (11) 
fk = fk+i+m k v%- f e k , (12) 



and define the torque 



t k — h k ■ n k 



(13) 



that acts along the axis h k at joint k and corresponds 
to the torsional interaction due to a twist around bond 
jfc-1. 



C. Spatial operator formulation 

Equations can be expressed more concisely in 

terms of 6-component "spatial" vectors that combine the 



translational and rotational quantities. It is also conve- 
nient to represent certain vectors by means of antisym- 
metric matrices of form 




so that uv 



v k 



u>k 

Vk 



u = 



u x v. The resulting equations are 
I \ / u; fe _! 

-6fc_i i J \ vk-i 

I \ / u> k -i 
-bk-i I / V 

u; fe _i x h k 9 k 
a; fe _i x (u>fc_i x bk-i) 





h k 




(14) 



(16) 



H£W fc 



X, 



or, equivalently, 

V fe = <^fc_ 1)fc Vfc- 

Afe = <£Li,fc A fc- 
where and are examples of spatial vectors, and 

I 
bk-i I 



J k-l,k 



(17) 
(18) 



(19) 



The 6x6 matrices 4>k-i k an d 4>k,k+i (later) appear 
throughout the derivation, and their role is to propa- 
gate kinematic and dynamic information between joints. 
Several other new variables have been used: 



HT 



hk 




(20) 



is a 6-component joint axis vector (in the more general 
case of a joint with d DOFs, which the formalism is ca- 
pable of handling, would become a 6 x d matrix), 



X fc = 



&k-i o 
u>k-i 



h k 6 k 
v k - v k -i 



(21) 



is a 6-component spatial vector containing the remain- 
ing acceleration terms of the current site, and W k = 9 k . 
When used in vectors and matrices, I and denote unit 
and zero block submatrices of the implied size. The 6- 
component vectors, and most of the associated matrices, 
are shown in block capitals (to retain some similarity 
with pq| , cf), ip and M are also used); no other special 
notation is needed since the variable types will be obvious 
from the context. 

In a similar way, Eqs. (|ll])-(|l^) can be rewritten as 



fk 



I b k 
I 



fk+l 



m k c k x v% + T k dj k +uj k x (I k u3 k 

m k v c k 

n % + c k x fk ' 
fk 



h k 




n k 
fk 



(22) 
(23) 
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or, equivalently, 

F fe = (f> k ,k + iF k+1 + M k A k + Y fc (24) 
T fe = H fc F fc . (25) 

Here Eq. (JsJ) has been used, and 

fl k - mk c k c k m k c k \ 

\ -m k c k m k l j y ' 

is the symmetric, 6x6 mass matrix; the 6-component 
vector 

Y — ( ^k{l k - m k c k c k )uj k \ ( n e k + c k x /| \ 
fc V m kU k u; k c k ) \ fk ) 

contains the remaining force contributions. The identity 
c fe x (uj k x (u>fc x Cfc)) = -Wfc x (c fe x (c fe x a>fc)) was 
used in obtaining these expressions. In order to use the 
recurrence relations for V kl A k and F k , the velocity and 
acceleration of the initial site, Vo and Ao, must be pro- 
vided, while the force associated with the site at the end 
of the final bond, F nr +i, is zero, since there is no joint 
associated with that site. 

The purpose of the recurrence relations in Eqs. JlS| ) 
and ( p4[ ) is to provide expressions for {W^}, which, to- 
gether with A , and assuming all the forces acting on 
the sites are known, can be integrated to solve for the 
chain dynamics; this is actually the opposite of the typ- 
ical robotics problem, in which the goal is to determine 
the forces required to produce a particular robot arm 
trajectory. 

D. Stacked operators 

Equations @, @, (§§) and @ can be rewritten in 



condensed, "stacked" form 

V = r V + H r W (28) 

A = T A + H T W + X (29) 

F = 0F + MA + Y (30) 

T = HF (31) 



that combines the entire set of k values. A quantity such 
as V containing all the V k values for the chain is also re- 
ferred to as a spatial vector, while, for example, the block 
matrix <j> containing all the 4> kyk +i matrices is a spatial 
operator. The stacked formalism leads to a concise and 
elegant formulation of the problem, free from inundation 
by indices as is often the case in the robotics literature, 
e.g.,©- 

The spatial operator approach was originally devel- 
oped for the case of a fixed initial bond Q - the base 
in the example of a robot arm - for which Vo = 0, so 
that W = col(#i, . . . , 9, lr ) is a vector with just n r com- 
ponents, and the other vectors and matrices are sized 
accordingly. In order to remove the fixed-base restriction 



p2|] , six extra DOFs are added to the problem by re- 
defining W = col(Vo, 0i, ■ ■ ■ , nr ) as a vector with n r + 6 
components; likewise for W. The size of the original 
6n r x n r block-diagonal matrix H = diag(Hi, . . . , H nj .) is 
increased to 6 (n r + 1 ) x (n r + 6) by including an extra 6x6 
block Ho = I, so that now H = diag(I, Hi, ... , H„ r ). The 
block-diagonal matrix M is of size 6(n r + 1) x 6(n r + 1); 
(j) has the same size, and its only non-zero blocks are 
those to the immediate right of the diagonal, namely 
{001, . . ■ , 0n r -i,n r }- Vectors V, A, F, X, and Y, all have 
6(n r + 1) components, e.g., V = col(Vo, . . . , V„ r ), and T 
is organized in the same way as W, with n r + 6 compo- 
nents; To = because the special k = joint exerts no 
torque. (Note that index order has been reversed from 
the original to make it more suitable for polymer use, 
and, for convenience, other aspects of the notation have 
been altered or simplified.) 

The next step is to define the matrix 

* = (i-^r\ (32) 

which is also used in the alternative form, $ = $<f> + I; 
because (j) nr+1 = 0, Eq. (^) is equivalent to $ = I + 4> + 
(f> 2 + • • • + 4> nr , which is an upper-triangular block matrix 
whose elements, each a 6 x 6 matrix, are 

f I j = i 

$ij = < 0m+i j = i + l (33) 

[ <f>i,i+l ■ ■ ■ <Aj-1j j > i + 1- 
Then, in terms of Eqs. (H)-© reduce to 

V = $ T H T W (34) 

A = $ r (H r W + X) (35) 

T = A^W + H$(M$ T X + Y), (36) 

where 

M = H$M$ T H r . (37) 

While M is a sparse, 6(n r +l) x 6(n r + l) block-diagonal 
matrix, M. is only of size (n r + 6) x (n r +6), but, although 
it is typically much smaller, it is fully populated. In prin- 
ciple Eq. ( p6[) can be numerically integrated to obtain W, 
and this is one of the approaches actually used in solving 
the problem, but the computational effort required for 
evaluating Ai^ 1 at each timestep to obtain W is of or- 
der 0((n r + 6) 3 ); for this reason such an approach is not 
practical for any but the shortest of chains. The alterna- 
tive method, described below, requires a computational 
effort of order 0(n r ), together with what amounts to the 
inversion of a 6 x 6 matrix; clearly this will prove to be a 
far more efficient approach, even for relatively small n r . 

E. Inversion of the mass matrix 

As a preliminary step in obtaining an explicit expres- 
sion for Mr 1 define |l6| the 6x6 matrix P k in terms of 
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M fe as 



>k,k+ 



AI - G fc+ iH fc+1 )P fc+1 ^ ifc+1 +M k . (38) 



In Eq. (38), 



G fc = P.H-D- 1 



HfePfeHl 



k ' 



(39) 
(40) 



where, for joints with a single DOF, Gfc is a 6-component 
vector and D& is a nonzero scalar; note also that is 
symmetric. (The motivation for introducing P/. is ex- 
plained in |16| and derives from the formal similarity of 
these equations with those used in the completely unre- 
lated field of linear filtering.) Also define 



ipk,k+i = 0fc,fc+i(I — Gfe + iHfe + i) 



(41) 



and substitute this in Eq. (p8[). The stacked versions of 
Eqs. are 



P = ^P(f) T + M 

G = PIFD- 1 

D = HPH T 

ip = 4>(l - GH). 



(42) 
(43) 
(44) 
(45) 



Matrices P and ip are of size 6(n r + 1) x 6(n r + 1), and 
G is (n r + 6) x 6(n r + 1) and block-diagonal (thus the 
product Gfc+iHfc+i is square). Matrix D is of size (n r + 
6) x (n r + 6); its first 6x6 diagonal block corresponds 
to Do, and the remaining n r diagonal elements are the 
scalars D^. From Eqs. (M2) and (45), 



0GHP0 T , 



(46) 



M = P - <pP<p 7 
and so, by using Eq. (|32]), 
$M$ T = P+$0P+P</- T $ T +$</-PH T D- 1 HP0 T $ T . (47) 

PH T from 



in (37), then use GD 



Substitute Eq. (\ 

(||), together with (|4j),To obtain 

M = HPH T + H$</PH T + HP</> T $ T H T 
+H$0PH T D- 1 HP0 T $ T H T 
= (I + H$0G)D(I + H$< ? !)G) T . (48) 

This alternative factorization of M. is aproduct of three 
(n r +6) x (n r + 6) matrices, unlike Eq. ( pl\ ) which involves 
nonsquare matrices. 

It is now a straightforward matter to invert M. . Use a 
special case of the Woodbury formula for the inverse of 
a matrix ||, (I + Q1Q2)" 1 = I - Qi(I + QaQi^Qa to 
write 



(I + H^G)" 1 = I - H$(I + 0GH$)"VG. 



(49) 



By analogy with Eq. (|2|) for $, define ^ = (I - ip) 1 ; 
then from Eqs. @ and (H), 



0GH, 



(50) 



so that (I + H^G)" 1 = I - H*0G. Thus the inverse of 
Eq. flU) is 



M' 1 = (I - H*0G) T D" 1 (I - IW0G), 

and so, from Eq. (|36|), 

W = (I-H*0G) T D" 1 (I-H^G) 
x[T-H$(M$ T X + Y)] 
= (I-H*0G) T D- 1 

x[T - RV((pGT + M$ T X + Y)], 



(51) 



(52) 



where Eq. ( p0[ ) is used in simplifying H(I — tycpGH)® 
H^. To eliminate first rewrite Eq. (52) as 



(I+H$0G) T W = D" 1 [T-H*(</>GT+M$ T X+Y)]. (53) 

Next, use Eq. @ with (§|) to get 

*M$ T = *P(0 T $ T +1) - 4<#> T $ r 

= *P + P< ? !) T $ T . (54) 

Then, using the transpose of Eq. ( |43] ) , it follows that 

(I + H$c£G) T W = D^E - G T T $ T X, (55) 

in which the force-like quantities 



E = T-HZ 

Z = *(</>GT + PX + Y) 



have been defined. Rearranging Eq . 
expression for A given in Eq. ( pq ) leads to 

W 



(56) 
(57) 

and using the 



D _1 E - G T r $ T (H T W + X) 
D _1 E — G T <ft T A. (58) 



It is also possible to eliminate ^ from Eq. (|57|) by sub- 
stituting T from @ to get (I - f0GH)Z = #(0GE + 
PX + Y) , and then using Eq. (|50|) to obtain 



Z = $(<^GE + PX + Y). 



(59) 



Explicit forms for the new recurrence relations embod- 
ied in Eqs. ( |58|) and (||^) are obtained by using Eq. (|3^ ) 
and reintroducing the k indices - 

Z fe = M+ i(Z fc+1 +G fc+ iE fc+1 )+P fc X fc +Y fc (6O) 



W fc = D-^fc-G 



fc<Pfc-l,/c A fc-l 



(61) 



These recurrence relations are used in opposite Re- 
directions; they succeed in providing the required results 
without the need for explicit evaluation of the matrix in- 
verse Mr 1 as implied by Eq. (Q). It is for this reason 
that the method has not been referred to as an "inverse 
matrix method" , a term sometimes seen in the litera- 
ture, but rather a "rigid link" method, a far more apt 
descriptor. 
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The expressions given here describe the entire chain, 
but, provided the end joints are handled correctly, these 
results can be used for linear segments that form part 
of a larger assembly, allowing more complicated tree-like 
structures to be treated. Furthermore, while the above 
formulation deals with the simplest case of a linear chain 
with a single torsional DOF per joint, it is readily ex- 
tended to more complex joints, enabling, for example, 
the constant bond-angle condition to be eliminated by 
allowing two DOFs at each joint (an alternative would 
be to decompose an individual joint into two coincident 
joints each with a single DOF). 

III. SIMULATION TECHNIQUES 

A. Linked-chain equations of motion 

The recurrence relations used to propagate velocities, 
forces, and accelerations along the chain are as follows: 
The (translational and rotational) velocities V& are ob- 
tained by starting with Vo and iterating Eq. dl7j), 

Vfc = $_i,fcVfc-i+H£W fc , k = l,...,n r . (62) 

The forces (and torques), as represented by Efc, together 
with the matrices Dfc and Gfc, are obtained by iterating 
Eqs. ( p8| ) and (|60|). For computational convenience, new 
quantities A' k and Z' fc are introduced; then, starting with 



n r + l — 


and Z' nr+1 = 0, 




Pfc = 


fc+i (I — Gfc + iHfc + i) 






xP H10L + l + M fe 




D fc = 


HfcPfcH^ 




G k = 




> 


z fc = 


<Afc,fc+i Z' fe+1 + PfcXfc + Yfc 




Efc = 


Tfc — HfcZfc 




Z'fc = 


Zfc + GfcEfc 





( 63 ) 

Finally, the values of Wfc (or Ok) are determined by start- 
ing with Ao (its evaluation is discussed below) , and iter- 
ating Eqs. @ and (§l|), 

A 'fc = <^fc-i,fc A fc-i ) 

W fe = D^Efc - G£A' fc \ k = l,...,n r . (64) 
Afc = A' fe + H£W fe + X fe J 

These recurrence relations, which are readily transformed 
into a suitable computer program, imply a series of op- 
erations (multiplications and additions) involving 6x6 
matrices and 6-component vectors, but the total compu- 
tational effort is only of order 0(n r ). 

Recall that the k = joint has six DOFs, and also that 
H = I, Xo = 0, and Wo = A . Now, because A_i = 0, 
it follows from Eq. ( |64| ) that Ao = D " 1 Eo, and since 
To = 0, 

D A = -Z , (65) 



where both Do and Zo have already been determined 
(above). Thus Ao can be evaluated numerically by solv- 
ing the set of six linear equations contained in Eq. ( |65| ) 
using the standard LU method ^3); the computational 
effort required for this initial joint is fixed and indepen- 
dent of n r . 

B. Leapfrog integration and rigid-body equations 

The familiar leapfrog method for integrating the MD 
translational equations of motion - which is algebraically 
equivalent to the Verlet method Q - is usually expressed 
in a form where the coordinates and velocities are eval- 
uated at alternate half-timesteps Jl5| . This minor in- 
convenience can be avoided by using a slightly modified 
form that breaks the integration procedure for a single 
timestep into two parts: Prior to computing the latest 
acceleration (a) values, update the velocities (v) by a 
half timestep using the previous accelerations, and then 
update the coordinates (r) by a full timestep using these 
intermediate velocity values, 

v(t + h/2) = v(t) + (h/2)a(t) (66) 
r(t + h) = r(t) + hv(t + h/2). (67) 

In the case of the polymer chain, this procedure is ap- 
plied to the translation coordinates of the k = site and 
(in scalar form) to each of the dihedral angles 9k', the 
treatment of the angular coordinates associated with the 
k = site, below, employs a related approach for dealing 
with the rotational equations. Next, use the new coor- 
dinates (and velocities if needed) to compute the latest 
acceleration values, then update the velocities over the 
second half timestep, 

v(t + h) = v(t + h/2) + (h/2)a(t + h). (68) 

In the linked-chain formulation, the initial bond of the 
chain is treated as a rigid body; the influence of the rest 
of the chain on it has already been taken into account and 
is contained in the force and torque transmitted through 
the first internal joint. There are a number of ways of de- 
scribing the orientation of a rigid body jlO) : Euler angles 
have proved very useful for analytic purposes because of 
their intuitive nature, but owing to a potentially singu- 
lar matrix that appears in the equations of motion they 
are not the preferred method for dealing with numerical 
problems. Quaternions have achieved popularity because 
of their singularity-free nature, but their normalization 
must be preserved against a small but persistent numer- 
ical drift |H| ^H). A more recently proposed alternative 
is to regard the complete rotation matrix as the dynam- 
ical variable; this is the representation that will be used 
here, since the integration scheme - which is based 
on operator splitting and maintains time-reversibility - 
is just another instance of the leapfrog method. 

In the original description Jl9), vector components 
were expressed in the principal-axis frame of the body. 
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Since the chain dynamical problem as a whole is solved 
in the space-fixed frame, the corresponding form of the 
rotational equations will be described here. If R denotes 
the rotation matrix of a rigid body, then the first part of 
the leapfrog integration step consists of a half-timestep 
update of the angular velocities, 



w(f + h/2) = uj(t) + (h/2)a(t), 



(69) 



where a = u>, followed by a full-timestep update of R 
using a symmetric product of matrices describing a series 
of small partial rotations, 



R T (t + h) = E7k U 2 U 3 U 2 Ui R T {t), 



(70) 



where, for convenience, the transpose of R is treated. 
Note that for the linked chain, the rigid body is associ- 
ated with the k = joint, so that R = R®. Each of the 
matrices 

C7i = U x {u x h/2), U 2 = U y (uj y h/2), U 3 = U z (cu z h) 

(71) 

describes a rotation about a single axis and is evaluated 
in the space-fixed frame. For small angles, they can be 
approximated in a way that preserves orthogonality, e.g., 



1 

U x (9) = ( cos6> -sinf 
sin 8 cos ( 











n 1-0-74 -e 

1/4, 



1+974 1+074 



The second part of the leapfrog step is 



(72) 



u>(t + h) = uj{t + h/2) + {h/2)a(t + h). (73) 

In the case of a single rigid body, the angular acceleration 
is determined from the torque t, namely a(t + h) = 
I^ 1 r(t + h), whereas for the linked chain this treatment 
is only required for the k = joint, and a is obtained 
by solving Eq. (p5|); the reason rigid bodies are usually 
treated in the body-fixed principal-axes frame is to ensure 
the diagonality of X, a consideration that is not relevant 
here. 

The complete procedure for a single timestep can be 
summarized as the following sequence of operations: in- 
tegrate (first part) to obtain base velocities and co- 
ordinates, and joint angular velocities and angles; de- 
termine site velocities, Eq. (p2|); evaluate site coordi- 
nates, Eqs. compute external forces and torques, 
and other necessar y q uantities; determine joint forces, 
Eq. (fSSj); solve Eq. flGq ) for the base acceleration; deter- 
mine joint accelerations, Eq. (|64[); integrate (second part) 
to obtain base velocities and joint angular velocities. 



atom groups) located at the chain sites. Here a simple 
soft-sphere repulsion, based on the Lennard- Jones poten- 
tial with a short-range cutoff, is all that is required: for 
a pair of atoms located at r$ and Vj , where ry = r j — rj , 
and Tij = \rij\, the potential is 



< r c 
> r. 



(74) 



with a cutoff r c = 2 1 / 6 cr (nearby pairs of atoms that 
are prevented from approaching too closely because of 
geometrical restrictions need not be considered). Should 
a pairwise attraction between particular pairs of distant 
chain atoms be required (as will be the case later on), 
it can be obtained from Eq. (|74|) by simply increasing 
r c . The pair forces derived from this potential, and their 
associated torques, contribute to /| and n| in Eqs. (j|) 
and @. 

The torsional potential associated with the dihedral 
angles 9k has the simple form 



u t (9k) 



-u k cos(V k 



3 (0) 



), 



(75) 



where 0^ is the dihedral angle that produces a ground 
state having the correct helical twist, and u k is the inter- 
action strength. The torque appearing in Eq. (|l3|) is 



t k = u k sin(6» fc - 



(76) 



a result whose simplicity stands in sharp contrast to the 
intricate vector algebra associated with torque calcula- 
tions when working in Cartesian coordinates fTq] . 

For the chains considered here it is assumed all |6fe| = 
b, a k = a, 6^ = 6*(°) and, except for the later twin-helix 
studies where selected u k — 0, all u k = u^°\ Since the 
torsion also acts at the first internal joint, it is necessary 
to add an extra site and bond to the chain (effectively 
with an index "—1") to make this torsion term mean- 
ingful; the first three sites of the modified chain form a 
rigid unit (the extra bond does not alter the preceding 
analysis) and the chain length is increased by unity. 

A spherical mass element (with a finite moment of in- 
ertia about its own center of mass) is associated with 
each site; for bonds with k > 0, the mass is attached to 
the far (k + 1 site) of the bond, while the k — bond, 
as explained above, has three masses associated with it. 
The components of the inertia tensor in Eqs. ( p6| ) and 
@ are 

(T, V— / Sftefe m «( r K ~ r Ki) i=j ( 77 \ 

where the sum (or volume integral) is over all mass ele- 
ments k fixed to bond k, and coordinates are relative to 
the center of mass of each bond in the space-fixed frame. 



C. Polymer chain model 

Two kinds of interactions are required in this model - 
excluded volume and torsion. The former is provided by 
a pair interaction that prevents overlap of the atoms (or 



D. Order parameter 

While the appearance of an ordered helical structure, 
even one with the occasional defect, is easily recognized 
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visually, in order to facilitate statistical analysis of the 
behavior it is important to be able to quantify the degree 
of order present in the chain. Let dk — b/c-i x bfc, then 

1 n r 

S=-J2d k (78) 

defines an order parameter that measures the long-range 
order present in the folded structure based on the orien- 
tation of the helical turns; for a single, well-formed helix, 
S should have a value close to unity. A slightly mod- 
ified version of S will be introduced later for studying 
twin-helix structures. 

This definition of S is particularly useful for detecting 
structures consisting of two or more helical domains with 
axes aligned in different directions due to a localized de- 
fect of the type seen in helically wound telephone and 
electrical cords. Since the correct helicity (or "handed- 
ness") is built into the interactions, it is unlikely that seg- 
ments of opposite helicity will independently nucleate at 
separate locations but, as the chain collapses, individual 
turns with the wrong twist can become trapped in the 
structure. These defects are capable of traveling along 
the chain, but this is a slow process, and the direction of 
motion is random unless close to the chain end. There 
are instances where the definition of S in Eq. ( ]7q ) can 
give an incomplete picture; if a wrong turn occurs very 
close to the chain end, its effect on S will be minimal, and 
even a perfectly formed helix is subject to low frequency 
bending motion. Other order parameters can be defined 
that are of a more short-range nature; for example, a 
simple count of the number of pairs of chain sites lying 
within a specified range (i.e., the number of "contacts" 
between adjacent turns of the helix) divided by the max- 
imum possible value, but for long chains the tolerance in 
the threshold required to accommodate thermal fluctua- 
tions might allow significant changes in the helical-axis 
direction to go undetected. 

IV. RESULTS 
A. Simulation details 

The most prominently recognizable structural motif 
found in proteins is the (a-) helix. A helix, because of its 
uniformity along the longitudinal axis, is a particularly 
simple structure to specify, and both Monte Carlo and 
MD helix-folding simulations based on the complex po- 
tentials designed for protein modeling have been carried 
out, e.g., [Esl Since the complexity of these poten- 
tials contributes little to a basic understanding of generic 
folding phenomena, the present simulations are based on 
the much simpler model and potentials described previ- 
ously. Indeed, an analogous approach has been employed 
experimentally pTj] in a study of helix formation in syn- 
thesized nonbiological chain molecules, where the inter- 
actions are simpler than in proteins (in particular, there 



are no hydrogen bonds). 

The importance of examining simple structures, such 
as the helix, is that the process by which ordered ar- 
rangements emerge from randomly coiled states is likely 
to capture something of the essence of real protein fold- 
ing, such as the cooperativity of the folding process, the 
role of nucleation sites, the degree to which folding is 
able to proceed to completion, and the steric and topo- 
logical effects of excluded volume. The key question, of 
course, is whether the intrinsic timescales of these pro- 
cesses are sufficiently small for simulation to be com- 
putationally feasible, an issue addressed by the results 
presented here. While the present model is admittedly 
a mere caricature of the detailed models normally em- 
ployed in studies of individual proteins, it has two unde- 
niable advantages, namely a known native ground state 
compatible with the interactions, and sufficiently modest 
computational requirements that MD simulation is able 
to encompass the time interval required for major confor- 
mational change. More complex protein structures also 
display certain common characteristics, and ought to be 
accessible to simulations of this type; it is, however, es- 
sential to eliminate any ambiguity from the ground state, 
something that nature itself has presumably achieved in 
the interests of efficiency and reliability. 

Each simulation run considers a single chain con- 
structed as described earlier. The absence of a solvent, 
apart from changing the timescales, should not alter the 
outcome; indeed many, if not most, protein simulations 
avoid introducing an explicit solvent for reasons of com- 
putational efficiency. The simulation is begun at a rela- 
tively high temperature, so that the kinetic energy is suf- 
ficiently large to surmount the torsional potential barri- 
ers. The initial chain configuration is a large loop extend- 
ing across the simulation cell, with a very slight helicity 
to prevent any overlap; initial dihedral angles are chosen 
so that locally, the conformation is almost a planar zigzag 
state. The joint angular velocities are assigned random 
values corresponding to the starting temperature, and 
memory of this initial state rapidly vanishes early in the 
simulation. The temperature is gradually reduced by a 
factor slightly less than unity at regular intervals until, 
towards the end of the run, very little kinetic energy re- 
mains in the system. The simulation region is bounded 
by hard, reflecting walls; while there are occasional wall 
collisions, this has little influence on the overall behavior. 
(The alternative would be to use periodic boundaries, 
which for a simulation cell not large enough to contain 
the chain in a fully-stretched state, would be subject to 
chain wraparound effects; while these are also unlikely 
to affect the overall behavior, they can prove visually 
confusing given the importance of computer-generated 
visualization in this work.) 

The gradual cooling that is imposed throughout the 
run plays several distinct roles. During the early stage it 
is used to drive the chain from a totally random state to 
one in which the torsional potential begins to have some 
influence over the dihedral angles. Then, as the temper- 
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ature is reduced further, an increasing degree of local or- 
der emerges and precursors to long-range order appear, 
either as a consequence of the merging of separate or- 
dered domains, or the spread of order from a nucleation 
region (or a combination of both processes); during this 
stage the imposed cooling performs a task normally the 
responsibility of the solvent, namely the removal of ex- 
cess potential energy as the chain evolves towards states 
of lower energy. Once the chain has reached a state con- 
sisting mainly of helical segments, possibly separated by 
small misfolded regions that have become trapped, the 
purpose of further temperature reduction is to gradually 
freeze out thermal fluctuations - without further major 
structural change - in order to allow evaluation of the 
long-range order parameter S (the measure of success of 
the folding process); the latter part of this cooling stage 
is not intended to imitate any real physical process. 

The simulations use standard, reduced MD units, in 
which all distances and energies are expressed in terms 
of the Lennard- Jones parameters a and e respectively; 
mass is expressed in terms of the monomer mass m and, 
consequently, the unit of time is \Jraa" 1 ft. Tempera- 
ture and energy are made numerically identical by set- 
ting the Boltzmann constant fee to unity. In terms of 
these units the parameters used in the runs are as fol- 
lows: The bond length b — 1.3, a value sufficiently short 
to prevent the chain crossing itself, the bond angle a and 
the preferred dihedral angle 0^ are chosen to produce 
helices with periodicity six, and the torsional potential 
strength = 5. In the studies of twin helices, the 



TABLE I: Details of helix folding runs discussed in the text. 



cutoff in the attractive interaction, based on Eq. ( |74[ ), 
occurs at r c = 2.2. The initial temperature is 4 (corre- 
sponding to a kinetic energy per DOF of 2) and the final 
temperature is 10 -3 ; temperature is reduced by rescal- 
ing all velocities and angular velocities by a factor Jt 
every 4000 timesteps, with fx = 0.95 or 0.97. The runs 
reported here are each of length 4-8 x 10 5 steps; the inte- 
gration timestep (in MD units) is ft. = 4 x 10~ 3 . In order 
to produce reliable statistics, a large number of runs were 
carried out for each case studied; the runs differed in the 
choice of initial random values for {9 k}- 



B. Folding to a single helix 

Measurements were made of the long-range order 
parameter S and the total energy, the latter a sum 
over contributions from the soft-sphere pair interactions, 
Eq. ([74|), the torsional terms, Eq. (|75[) , and the kinetic en- 
ergy. The measurements involved 400 independent runs 
for each of several chain lengths L and different cooling 
rates. These quantitative results were complemented by 
an interactive graphical version of the simulation pro- 
gram that provided real-time visual monitoring of the 
folding process; in addition to learning about any poten- 
tial obstructions to complete folding, the ability to ob- 
serve chains directly also helped when choosing a cooling 
rate sufficiently fast for folding to proceed to completion, 
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but not too fast for an excessive number of defects to 
become trapped in the nascent structures. 

The viability of the underlying approach depends on 
whether it can actually produce correctly structured he- 
lices. The first series of results measures the fraction of 
chains that successfully fold into a helical state, and the 
manner in which the success rate depends on L and the 
cooling rate. A summary appears in Table Q; L ranges 
from 18 to 90, which, since the helix period is six, corre- 
sponds to 3-15 full helical turns. 

Owing to the large number of runs it is not possible to 
provide a detailed history of each, so a quantitative mea- 
sure of folding success must be introduced. A successfully 
folded helix is deemed to be one for which S > 0.88 at 
least once during the last 1.2 x 10 5 steps of the run (mea- 
surements are made every 4000 steps) ; by this stage of the 
run the system has reached a comparatively low temper- 
ature, so that further substantial conformational changes 
are unlikely. Visual analysis confirms that, for the cases 
considered, this threshold for S provides a quite reliable 
estimator; it tends to be sensitive to defects in the heli- 
cal structure, while allowing for the fact that a properly 
folded helix may still has some residual curvature along 
its major axis. 

It is clear from Table | that a high success rate for he- 
lix production is achieved. Two trends are apparent in 
the results, neither of them unexpected: for a given fx, 
longer chains are less likely to fold properly than shorter 
chains, and, for a given L, a larger fx (corresponding to 
slower cooling) raises the success rate. Thus the longer 
the chain, the slower the desired cooling rate; additional 
runs with faster cooling confirm this observation. The 
longest of the chains folds to a helix with 15 turns, which, 
considering the potential for defects, represents a signif- 
icant victory of energy over entropy. 

The rate at which chains approach the helically ordered 
state can be studied by monitoring the mean values of 
S, as well as the negative of the total energy (which is 
dominated by the torsional component when in the folded 
state); these quantities provide measures of the long- and 
short-range order, respectively. The results, normalized 
per DOF, for 54- and 90-site chains, averaged over all 
400 runs, are shown in Figs. ^ and [l| The overall results 
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FIG. 3: Averaged order parameter and (negative) total energy 
per DOF as functions of time (in dimensionless MD units) for 
chains with L — 54; the contributions of chains that do and do 
not fold correctly appear in separate curves, with the upper 
curve in each case corresponding to the successful folders. 
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FIG. 4: Order parameter and energy for L — 90 (similar to 
Fig. |). 



are divided into two groups, depending on whether the 
chain is classified as having folded successfully or not, 
and error bars indicate the standard deviations of the 
measurements. In each case it is the upper curve that 
represents the average for the successfully folded chains, 
and it has the smaller error bars. 

An alternative estimate of the rate at which folding 
proceeds is based on the time dependence of the fraction 
of chains in a helical state, relative to all those that even- 
tually succeed in reaching this state. This provides infor- 
mation about when, assuming a chain folds successfully, 
the appearance of helical order actually occurs. Fig. || 
shows these cumulative distributions for the different L, 
each at the slowest cooling rate considered. The cooling 
rate clearly affects the results, as can be seen from the 
separation of the two groups of curves that are based on 



FIG. 5: Cumulative distributions of chains in the folded state 
as a function of time, for different lengths (L). 



different rates (see Table ||) ; for a given cooling rate, the 
folding speed tends to drop as the chains become longer 
(the slight crossover of the L = 72 and 90 curves is prob- 
ably not significant). 

A more detailed examination of final-state conforma- 
tions is based on histograms of the ^-distribution, using 
measurements made over the last 1.2 x 10 5 steps. These 
results appear in Fig. || for several L values (at the slow- 
est cooling rate). There are two separate curves for each 
L, one showing the spread of S for those chains that sat- 
isfied the folding criterion at least once during this mea- 
surement period, and a broader, much lower curve for 
the chains that did not. The former set of distributions 
become broader with increasing L; there are several con- 
tributing causes for this, including slower folding rates 
(all the runs were of equal length) , chains not managing 
to fold successfully but having one or more intermediate 
S values which passed the test, and the increasing effect 
of bending along the helical axis. The latter, broader dis- 
tributions (scaled up by a factor of 10 to make the details 
visible) are due both to the many defective structures 
possible, each with its own spread of S values, and also 
to the defects themselves reducing the structural rigidity 
and so raising the susceptibility to slow thermal vibra- 
tion. Only for the longest chains is there any overlap of 
the curves, and even then it is minimal. 

The best way to follow the folding process is by view- 
ing animated sequences of images taken at various points 
during the run; some sequences can actually be generated 
while running the simulation interactively, if the com- 
putations proceed sufficiently rapidly. Here, due to the 
limitations of the printed page, a selection of static im- 
ages must suffice. One could attempt a verbal descrip- 
tion of what transpires but, as was the case in pl|, there 
are no obvious features shared by the individual fold- 
ing trajectories. Even if certain common characteristics 
do exist, the strong random conformational fluctuations 
make their observation difficult; a systematic, quantita- 
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FIG. 6: Order parameter distributions; separate curves show 
results for chains that did (peaks on the right) and did not 
(multiplied by a factor of 10) fold correctly. 




FIG. 7: Correctly folded helix (L = 90) with residual curva- 
ture; this is the most extreme case of bending observed. 



tive means for identifying pathways, that extends ideas 
used for equilibrium states Q , might prove helpful in this 
task. 

Figure ^, which appeared early in the paper, shows 
an image of a typical, well-formed, almost straight helix 
obtained in one of the runs, while Fig. || shows a random 
chain configuration observed near the start of a run, both 
for L = 90 chains. For clarity, these and subsequent 
pictures represent the chains by their tubular envelopes, 
rather than by showing individual, partially overlapped 
spheres representing the monomers. The tube thickness 
corresponds to unit diameter, slightly less than the soft- 
sphere interaction cutoff to ensure that a small amount 
of space remains visible between adjacent turns of the 
helix. The maximum amount of residual curvature that 
was observed in the backbones of properly folded helices 
is apparent from Fig. Q this example actually meets the 
criterion for folding success (as indeed it should). 

While the majority of runs (85% for chains with L = 
90, and an even higher proportion for smaller L, see Ta- 
ble ||) result in a correctly folded helix, examination of the 
kinds of defects that appear in the final states of those 
runs that fail to fold properly is an informative exercise. 
The first such picture, Fig. |8|, is of an L = 90 chain with 
two helical regions separated by a single defect. The de- 
fect is essentially a single loop of the helix with a reverse 
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FIG. 8: Incorrectly folded state with a single defect. 




FIG. 9: Incorrectly folded state with two separate defects. 



fold that became frozen in place during the cooling pro- 
cess. This is the most frequent type of defect, and its 
location can be anywhere in the chain, even right at the 
end. 

Less frequent are chains with two spatially separated 
defects, as shown in Fig. ^. More extreme, but very rare 
examples of other kinds of defects appear in Figs, [h] and 
pd| . These show what can happen when the chain starts 
to become entangled with itself; in the first case the prob- 
lem is localized, but in the second (the only example of 
its kind observed) there is a relatively large loop trapped 
by the entanglement. The fact that these defects are rel- 
atively infrequent, and that even this low level of failure 
can be reduced by lowering the cooling rate still further, 
attest to the robustness and reliability of the folding pro- 
cess. 




FIG. 10: Folded state with a localized intertwined defect. 
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FIG. 11: 
loop. 



Folded state with a complex defect involving a large 
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FIG. 12: Averaged order parameter and energy as functions 
of time for chains that form antiparallel helix pairs (L — 78). 



C. Folding to a pair of helices 

The helix formation described above is obtained in a 
study of homopolymers where, due to the uniformity of 
the chain, the only kind of repeating structure that can 
be produced is the helix (the planar zigzag conforma- 
tion is a degenerate case) . In a protein context this kind 
of structure is classed as secondary because helices of- 
ten serve as structural components in more complex as- 
semblies. Globular proteins are characterized by at least 
one additional level in the structural hierarchy, namely 
tertiary structure. A model capable of demonstrating 
tertiary structure requires at least some differentiation 
among the chain sites that breaks the translational invari- 
ance. Building on the helix-forming model investigated 
here, the next stage of complexity is a packed assembly of 
helices, a structure that incorporates both the secondary 
and tertiary levels of the hierarchy. The simplest way to 
design such a structure is to include non-local, attrac- 
tive forces between selected pairs of chain sites; here, the 
pairs involved are located at chain positions that will be 
brought into proximity following the collapse into a state 
with two adjacent helices aligned in antiparallel direc- 
tions. Such highly specific interactions are reminiscent 
of an approach used for lattice protein models p8[ ; the 
overall simplicity should be contrasted with the highly 
detailed model, complete with solvent, used in an MD 



study of the unfolding of a three- helix bundle 1 29 



Choosing the interactions to produce a twin-helix 
structure is accomplished as follows: For a homogeneous 
chain with L = rib + 1 sites, in which the periodicity 
of the helix is p, the ground state consists of n t = L/p 
turns. Now assume that n t is an odd number and choose 
the interactions appropriate for a pair of adjacent helices, 
each with (n t — l)/2 turns, joined by a "bridging" chain 
segment of length p. All that remains is to identify the 
pairs of sites in the two helical regions that must attract; 
these are just neighboring sites in adjacent turns of one 
of the helical segments, matched with the corresponding 



sites, in reverse order, of the other. The strength of the 
attractive potential responsible for the tertiary structure, 
which is based on Eq. (|74|), is 0.2u(°); it is weaker than 
the torsion, but the question of whether tertiary struc- 
ture formation is the beneficiary or the cause of secondary 
structure formation [|l] is not addressed here. In these ex- 
ploratory computations, the torsional interactions along 
the bonds in the bridging segment are set to zero for sim- 
plicity; such interactions could actually be used to assist 
the folding and will be the subject of future study. 

The definition of the long-range order parameter S, 
Eq. ( |78[ ) , must be modified to reflect the structure of the 
anticipated collapsed state. The partial contributions to 
S of the two helical segments are now combined with op- 
posite signs, and contributions from the bridging region 
ignored; this provides a reasonably sensitive, but unam- 
biguous, measure of folding success. Fig. [l2] shows how 
both the modified S and the energy vary with time, for 
L = 78 chains (corresponding to a folded state consisting 
of a pair of 6-turn helices), using runs whose details are 
otherwise similar to those for L = 90. Based on visual 
analysis of the behavior, a different definition of what 
constitutes successful folding is needed here, namely that 
the value of the modified S must now exceed 0.95 for fold- 
ing to be considered successful; using this criterion the 
success fraction was found to be 0.66. 

Figure [l3] shows an example of a successfully folded 
helix pair, an ordered conformation in which both sec- 
ondary and tertiary structural elements are manifest; 
two-thirds of the runs ended in this state. The failure to 
fold properly was generally not due to defects in the in- 
dividual helical segments, but because the two secondary 
components did not succeed in aligning correctly; an ex- 
ample of such an outcome is shown in Fig. [l4| The attrac- 
tive forces become much more effective once the helical 
segments have formed; this is due to the linear arrange- 
ment of the attraction sites enabling them to function 
cooperatively, an effect that may be reflected in real pro- 



14 




FIG. 13: Well-formed pair of helices with antiparallel align- 
ment. 




FIG. 14: Pair of helices that have failed to align. 

teins with prominent secondary-structural features. As 
a result of the nature of the interactions and the rela- 
tive interaction strengths, the helical segments form first, 
essentially unimpeded, and only then do they attempt 
to align. The failure to align here is a symptom of the 
absence of any driving force for bringing the helices to- 
gether; there is no torsional preference in the bridging 
segment and the range of the inter-helix attraction is too 
short to be felt if the helical segments are well separated. 
Changes to either or both these aspects of the potential 
should alter the behavior, but care is required to avoid 
hindering the helix formation process in any way. 

V. CONCLUSIONS 

The present paper has focused on both methodology 
and results. A formalism developed for the dynamics of 
robotic manipulators and other coupled mechanical sys- 
tems - that provides a convenient and direct representa- 
tion of the dynamics of bodies connected by rigid links 
with restricted degrees of freedom - has been utilized in a 
polymer context, with the clear implication that existing 
methods based on geometric constraints may be redun- 
dant in many instances. Since the treatment also in- 
volves dealing with rigid-body dynamics, a computation- 
ally more effective method than the often-used quater- 
nion approach is also employed. 



The results of an extensive series of MD simulations 
demonstrate that homopolymer chains with suitable tor- 
sional interactions consistently collapse into well-formed 
helices; the probability of localized defects being frozen 
into the structure depends on the cooling rate, and it 
can be reduced to a very low level by cooling sufficiently 
slowly. In order to demonstrate that the present sim- 
plified approach is relevant for protein folding, heteroge- 
neous chains, with interactions favoring the development 
of antiparallel pairs of helices, were shown to produce 
coexisting secondary and tertiary structural features. 

The order parameters introduced to quantify the de- 
gree of folding were tailored to capture the structural 
order present in the final state of the polymer. To study 
the details of folding pathways, other order parameters 
(or reaction coordinates) that capture features present in 
the intermediate states, but not necessarily in the final 
state, could be defined. In the twin-helix case, for exam- 
ple, a simple sum of the absolute values of S, evaluated 
separately for each helix-forming segment of the chain, 
might prove useful, since this quantity reaches its maxi- 
mum upon completion of secondary structure formation, 
and is not seriously affected by subsequent rearrangement 
at the tertiary level. 

The apparent success of the MD approach to chain 
folding used here is important for another reason. The 
widely-cited Levinthal "paradox" Q] implies that since 
the number of states accessible to a protein grows expo- 
nentially with residue count, the time required for even a 
small protein to seek out its native state is, for practical 
purposes, infinite. Since nature does not suffer from this 
problem, the implication is that substantial portions of 
the folding process occur along certain well-characterized 
pathways; thus the molecules do not really wander almost 
aimlessly through conformation space, and hence there is 
no paradox. In order to begin to simulate such processes 
it is necessary to resort to a computationally efficient 
model, with realistic dynamics and a unique but readily 
determined low-energy "native" state; this is precisely 
what has been accomplished in the present work. 

The type of model introduced here provides a starting 
point for exploration in several directions. While the in- 
teractions were weighted to construct the secondary helix 
structure prior to forming features at the tertiary level, a 
change in the relative strength of the interactions would 
allow aspects of both levels of organization to appear con- 
currently. Chains could be designed to fold into other 
idealized compact structures, such as the packed cube 
used in some lattice studies jl], |), or sheetlike confor- 
mations that also represent important secondary struc- 
ture components; furthermore, packed states with differ- 
ent degrees of accessibility could provide useful informa- 
tion on how this feature influences folding success. The 
interactions can be modified and new types of interac- 
tions added; polymer topology can be changed by the 
addition of side chains corresponding to residues with 
extended structure. Common to all these enhancements 
is that the model must always be designed with a known 
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lowest-energy state, and in this respect the approach dif- 
fers from many other types of protein simulation. While 
models of this kind are perhaps limited in the kinds of 
questions they can address, there are more than enough 
issues requiring attention where they can prove helpful. 

In a sense, the role played by such highly simplified 
models is analogous to the Ising model of ferromagnetism 
pof ; while it is not usually claimed that an Ising spin sys- 
tem accurately represents a real magnetic material (or, 
for that matter, any other kind of real physical system, 
when used for other kinds of problems such as lattice 
gases) it does however capture a great deal of the essence 
of the problem, to an extent that the study of Ising and 
related models has resulted in important advances, both 
for spin systems in particular, and for statistical mechan- 
ics and critical phenomena in general. Proteins can also 



be modeled with a high level of detail and specificity, but 
the tradeoff is that only short trajectory segments can be 
followed with an investment of a reasonable amount of 
computing effort; hopefully, extensive studies of simpli- 
fied polymer models of the kind examined here, in which 
the design is tailored to reproduce certain generic aspects 
of macromolecular behavior, will achieve greater popular- 
ity as their usefulness becomes established. 
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